{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<p align=\"center\">\n",
    "    <img src=\"https://github.com/GeostatsGuy/GeostatsPy/blob/master/TCG_color_logo.png?raw=true\" width=\"220\" height=\"240\" />\n",
    "\n",
    "</p>\n",
    "\n",
    "## Subsurface Data Analytics \n",
    "\n",
    "## Interactive Demonstration of LASSO Regression and Intro to Hyperparameter Tuning\n",
    "\n",
    "#### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
    "\n",
    "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n",
    "\n",
    "In the Fall of 2019 my students requested a demonstration to show the value of LASSO regression. I wrote this interactive demonstration to show cases in which the use of regularization coefficient, a hyperparameter, that reduces the model flexibilty / sensivity to training data (reduces model variance) improves the prediction accuracy.  \n",
    "\n",
    "### PGE 383 Exercise: Interactive Demonstration of LASSO Regression and Intro to Hyperparameter Tuning\n",
    "\n",
    "Let's start by introducing linear regression, expanding to LASSO regression with a regularization coefficient with feature selection, and then explain the 2 interactive demonstrations in this notebook (we had a 2 for 1 sale this week!).\n",
    "\n",
    "#### Linear Regression\n",
    "\n",
    "Linear regression for prediction.  Here are some key aspects of linear regression:\n",
    "\n",
    "**Parametric Model**\n",
    "\n",
    "* the fit model is a simple weighted linear additive model based on all the available features, $x_1,\\ldots,x_m$.\n",
    "\n",
    "* the parametric model takes the form of: \n",
    "\n",
    "\\begin{equation}\n",
    "y = \\sum_{\\alpha = 1}^m b_{\\alpha} x_{\\alpha} + b_0\n",
    "\\end{equation}\n",
    "\n",
    "**Least Squares**\n",
    "\n",
    "* least squares optimization is applied to select the model parameters, $b_1,\\ldots,b_m,b_0$ \n",
    "\n",
    "* we minize the error, residual sum of squares (RSS) over the training data: \n",
    "\n",
    "\\begin{equation}\n",
    "RSS = \\sum_{i=1}^n (y_i - (\\sum_{\\alpha = 1}^m b_{\\alpha} x_{\\alpha} + b_0))^2\n",
    "\\end{equation}\n",
    "\n",
    "* this could be simplified as the sum of square error over the training data, \n",
    "\n",
    "\\begin{equation}\n",
    "\\sum_{i=1}^n (\\Delta y_i)^2\n",
    "\\end{equation}\n",
    "\n",
    "**Assumptions**\n",
    "\n",
    "* **Error-free** - predictor variables are error free, not random variables \n",
    "* **Linearity** - response is linear combination of feature(s)\n",
    "* **Constant Variance** - error in response is constant over predictor(s) value\n",
    "* **Independence of Error** - error in response are uncorrelated with each other\n",
    "* **No multicollinearity** - none of the features are redundant with other features \n",
    "\n",
    "#### Other Resources\n",
    "\n",
    "This is a tutorial / demonstration of **Linear Regression**.  In $Python$, the $SciPy$ package, specifically the $Stats$ functions (https://docs.scipy.org/doc/scipy/reference/stats.html) provide excellent tools for efficient use of statistics.  \n",
    "I have previously provided this example in R and posted it on GitHub:\n",
    "\n",
    "1. R https://github.com/GeostatsGuy/geostatsr/blob/master/linear_regression_demo_v2.R\n",
    "2. Rmd with docs https://github.com/GeostatsGuy/geostatsr/blob/master/linear_regression_demo_v2.Rmd \n",
    "3. knit as an HTML document(https://github.com/GeostatsGuy/geostatsr/blob/master/linear_regression_demo_v2.html) \n",
    "\n",
    "#### LASSO Regression\n",
    "\n",
    "With the lasso we add a hyperparameter, $\\lambda$, to our minimization, with a shrinkage penalty term.\n",
    "\n",
    "\\begin{equation}\n",
    "\\sum_{i=1}^n \\left(y_i - \\left(\\sum_{\\alpha = 1}^m b_{\\alpha} x_{\\alpha} + b_0 \\right) \\right)^2 + \\lambda \\sum_{j=1}^m |b_{\\alpha}|\n",
    "\\end{equation}\n",
    "\n",
    "As a result the lasso has 2 criteria:\n",
    "\n",
    "1. set the model parameters to minimize the error with training data\n",
    "\n",
    "2. shrink the estimates of the slope parameters towards zero. Note: the intercept is not affected by the lambda, $\\lambda$, hyperparameter.\n",
    "\n",
    "Note the only difference between the lasso and ridge regression is:\n",
    "\n",
    "* for the lasso the shrinkage term is posed as an $\\ell_1$ penalty ($\\lambda \\sum_{\\alpha=1}^m |b_{\\alpha}|$) \n",
    "\n",
    "* for ridge regression the shrinkage term is posed as an $\\ell_2$ penalty ($\\lambda \\sum_{\\alpha=1}^m \\left(b_{\\alpha}\\right)^2$).\n",
    "\n",
    "While both ridge regression and the lasso shrink the model parameters ($b_{\\alpha}, \\alpha = 1,\\ldots,m$) towards zero:\n",
    "\n",
    "* the lasso parameters reach zero at different rates for each predictor feature as the lambda, $\\lambda$, hyperparameter increases. \n",
    "\n",
    "* as a result the lasso provides a method for feature ranking and selection!\n",
    "\n",
    "The lambda, $\\lambda$, hyperparameter controls the degree of fit of the model and may be related to the model variance and bias trade-off.\n",
    "\n",
    "* for $\\lambda \\rightarrow 0$ the prediction model approaches linear regression, there is lower model bias, but the model variance is higher\n",
    "\n",
    "* as $\\lambda$ increases the model variance decreases and the model bias increases\n",
    "\n",
    "* for $\\lambda \\rightarrow \\infty$ the coefficients all become 0.0 and the model is the global mean\n",
    "\n",
    "#### Train / Test Split\n",
    "\n",
    "The available data is split into training and testing subsets.\n",
    "\n",
    "* in general 15-30% of the data is withheld from training to apply as testing data\n",
    "\n",
    "* testing data selection should be fair, the same difficulty of predictions (offset/different from the training dat\n",
    "\n",
    "#### Machine Learning Model Training\n",
    "\n",
    "The training data is applied to train the model parameters such that the model minimizes mismatch with the training data\n",
    "\n",
    "* it is common to use **mean square error** (known as a **L2 norm**) as a loss function summarizing the model mismatch\n",
    "\n",
    "* **miminizing the loss function** for simple models an anlytical solution may be available, but for most machine this requires an iterative optimization method to find the best model parameters\n",
    "\n",
    "This process is repeated over a range of model complexities specified by hyperparameters. \n",
    "\n",
    "#### Machine Learning Model Tuning\n",
    "\n",
    "The withheld testing data is retrieved and loss function (usually the **L2 norm** again) is calculated to summarize the error over the testing data\n",
    "\n",
    "* this is repeated over over the range of specified hypparameters\n",
    "\n",
    "* the model complexity / hyperparameters that minimize the loss function / error summary in testing is selected\n",
    "\n",
    "This is known are model hypparameter tuning.\n",
    "\n",
    "#### Machine Learning Model Overfit\n",
    "\n",
    "More model complexity/flexibility than can be justified with the available data, data accuracy, frequency and coverage\n",
    "\n",
    "* Model explains “idiosyncrasies” of the data, capturing data noise/error in the model\n",
    "\n",
    "* High accuracy in training, but low accuracy in testing / real-world use away from training data cases – poor ability of the model to generalize\n",
    "\n",
    "#### The Interactive Demonstrations\n",
    "\n",
    "Here's a simple workflow, demonstration of predictive machine learning model training and testing for overfit.  We use a:\n",
    "\n",
    "* simple polynomial model\n",
    "\n",
    "* 1 preditor feature and 1 response feature\n",
    "\n",
    "for an high interpretability model/ simple illustration.\n",
    "\n",
    "#### Workflow Goals\n",
    "\n",
    "Learn the basics of machine learning training, tuning for model generalization while avoiding model overfit. We use the very simple case of ridge regression that introduces a hyperparameter to linear regression.\n",
    "\n",
    "We consider 2 examples:\n",
    "\n",
    "1. **A linear model + noise** to make a random dataset with 1 predictor feature and 1 response feature. You can make the model, by-hand set the lambda hyperparameter and observe the impact. The code actually runs many lambda values so you can explore accurate and inacturate (over and underfit models).\n",
    "\n",
    "2. **A loaded multivariate subsurface dataset** with random resampling to explore uncertainty in your result. Like above you can set the lambda hyperparameter by-hand and observe the model accuracy in train and test over a range model flexibility. In this case, since the model is quite multidimensional, you can see the cross validation plot instead of the visualization of the model.\n",
    "\n",
    "#### Getting Started\n",
    "\n",
    "You will need to copy the following data files to your working directory.  They are available [here](https://github.com/GeostatsGuy/GeoDataSets):\n",
    "\n",
    "* [unconv_MV.csv](https://github.com/GeostatsGuy/GeoDataSets/blob/master/unconv_MV.csv)\n",
    "\n",
    "The dataset is available in this repository, https://github.com/GeostatsGuy/GeoDataSets. \n",
    "\n",
    "* download this file to your working directory\n",
    "\n",
    "#### Import Required Packages\n",
    "\n",
    "We will also need some standard packages. These should have been installed with Anaconda 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import geostatspy.GSLIB as GSLIB                       # GSLIB utilies, visualization and wrapper\n",
    "import geostatspy.geostats as geostats                 # GSLIB methods convert to Python    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will also need some standard packages. These should have been installed with Anaconda 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import os                                               # to set current working directory \n",
    "import sys                                              # supress output to screen for interactive variogram modeling\n",
    "import io\n",
    "import numpy as np                                      # arrays and matrix math\n",
    "import pandas as pd                                     # DataFrames\n",
    "import matplotlib.pyplot as plt                         # plotting\n",
    "from sklearn.model_selection import train_test_split    # train and test split\n",
    "from sklearn.metrics import mean_squared_error          # model error calculation\n",
    "from sklearn.linear_model import Ridge                  # ridge regression\n",
    "from sklearn.linear_model import Lasso                  # the lasso implemented in scikit learn\n",
    "import scipy                                            # kernel density estimator for PDF plot\n",
    "from matplotlib.pyplot import cm                        # color maps\n",
    "from ipywidgets import interactive                      # widgets and interactivity\n",
    "from ipywidgets import widgets                            \n",
    "from ipywidgets import Layout\n",
    "from ipywidgets import Label\n",
    "from ipywidgets import VBox, HBox"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing 'python -m pip install [package-name]'. More assistance is available with the respective package docs.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Demonstration 1, Simple Linear Model + Noise\n",
    "\n",
    "Let's build the code and dashboard in one block for concisenss. I have other examples to cover basics if you need that.\n",
    "\n",
    "#### Build the Interactive Dashboard\n",
    "\n",
    "The following code:\n",
    "\n",
    "* makes a random dataset, change the random number seed and number of data for a different dataset\n",
    "* loops over polygonal fits of 1st-12th order, loops over mulitple realizations and calculates the average MSE and P10 and P90 vs. order\n",
    "* calculates a specific model example\n",
    "* plots the example model with training and testing data, the error distributions and the MSE envelopes vs. complexity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "text_trap = io.StringIO()\n",
    "sys.stdout = text_trap\n",
    "\n",
    "l = widgets.Text(value='                            Machine Learning LASSO Regression Hyperparameter Tuning Demo, Prof. Michael Pyrcz, The University of Texas at Austin',\n",
    "                 layout=Layout(width='950px', height='30px'))\n",
    "\n",
    "n = widgets.IntSlider(min=5, max = 200, value=30, step = 1, description = 'n',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)\n",
    "split = widgets.FloatSlider(min=0.05, max = .95, value=0.20, step = 0.05, description = 'Test %',orientation='horizontal',style = {'description_width': 'initial'}, continuous_update=False)\n",
    "std = widgets.FloatSlider(min=0, max = 200, value=0, step = 1.0, description = 'Noise StDev',orientation='horizontal',style = {'description_width': 'initial'}, continuous_update=False)\n",
    "lam = widgets.FloatLogSlider(min=-5.0, max = 5.0, value=1,base=10,step = 0.2,description = 'Regularization',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)\n",
    "\n",
    "ui = widgets.HBox([n,split,std,lam],)\n",
    "ui2 = widgets.VBox([l,ui],)\n",
    "\n",
    "def run_plot(n,split,std,lam):\n",
    "    seed = 13014; nreal = 20; slope = 20.0; intercept = 50.0\n",
    "    np.random.seed(seed) # seed the random number generator\n",
    "    lam_min = 1.0E-5;lam_max = 1.0e5\n",
    "    \n",
    "    # make the datastet\n",
    "    X_seq = np.linspace(0.0,100,100)\n",
    "    X = np.random.rand(n)*20\n",
    "    y = X*slope + intercept # assume linear + error\n",
    "    y = y + np.random.normal(loc = 0.0,scale=std,size=n) # add noise\n",
    "    \n",
    "    # calculate the MSE train and test over a range of complexity over multiple realizations of test/train split\n",
    "    clam_list = np.logspace(-5,5,20,base=10.0)\n",
    "    #print(clam_list)\n",
    "    cmse_train = np.zeros([len(clam_list),nreal]); cmse_test = np.zeros([len(clam_list),nreal])\n",
    "    cmse_truth = np.zeros([len(clam_list),nreal])\n",
    "    for j in range(0,nreal):\n",
    "        for i, clam in enumerate(clam_list):\n",
    "            #print('lam' + str(clam))\n",
    "            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split, random_state=seed+j)\n",
    "            n_train = len(X_train); n_test = len(X_test)\n",
    "            lasso_reg = Lasso(alpha=clam)\n",
    "            #print(X_train.reshape(n_train,1))\n",
    "            lasso_reg.fit(X_train.reshape(n_train,1),y_train)\n",
    "            #print('here')\n",
    "            y_pred_train = lasso_reg.predict(X_train.reshape(n_train,1))\n",
    "            y_pred_test = lasso_reg.predict(X_test.reshape(n_test,1))\n",
    "            y_pred_truth = lasso_reg.predict(X_seq.reshape(len(X_seq),1))\n",
    "            y_truth = X_seq*slope + intercept\n",
    "            cmse_train[i,j] = mean_squared_error(y_train, y_pred_train)\n",
    "            cmse_test[i,j] = mean_squared_error(y_test, y_pred_test)\n",
    "            cmse_truth[i,j] = mean_squared_error(y_truth, y_pred_truth)\n",
    "    # summarize over the realizations\n",
    "    cmse_train_avg = cmse_train.mean(axis=1)\n",
    "    cmse_test_avg = cmse_test.mean(axis=1)\n",
    "    cmse_truth_avg = cmse_truth.mean(axis=1)\n",
    "    cmse_train_high = np.percentile(cmse_train,q=90,axis=1)\n",
    "    cmse_train_low = np.percentile(cmse_train,q=10,axis=1)        \n",
    "    cmse_test_high = np.percentile(cmse_test,q=90,axis=1)\n",
    "    cmse_test_low = np.percentile(cmse_test,q=10,axis=1)\n",
    "    cmse_truth_high = np.percentile(cmse_truth,q=90,axis=1)\n",
    "    cmse_truth_low = np.percentile(cmse_truth,q=10,axis=1)\n",
    "    \n",
    "#     cmse_train_high = np.amax(cmse_train,axis=1)\n",
    "#     cmse_train_low = np.amin(cmse_train,axis=1)        \n",
    "#     cmse_test_high = np.amax(cmse_test,axis=1)\n",
    "#     cmse_test_low = np.amin(cmse_test,axis=1)\n",
    "    \n",
    "    # build the one model example to show\n",
    "    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split, random_state=seed)\n",
    "    n_train = len(X_train); n_test = len(X_test)\n",
    "    lasso_reg = Lasso(alpha=lam)\n",
    "    lasso_reg.fit(X_train.reshape(n_train,1),y_train)\n",
    "    y_pred_train = lasso_reg.predict(X_train.reshape(n_train,1))\n",
    "    y_pred_test = lasso_reg.predict(X_test.reshape(n_test,1))\n",
    "    y_pred_truth = lasso_reg.predict(X_seq.reshape(len(X_seq),1))\n",
    "     \n",
    "    # calculate error\n",
    "    error_seq = np.linspace(-100.0,100.0,100)\n",
    "    error_train = y_pred_train - y_train\n",
    "    error_test = y_pred_test - y_test\n",
    "    error_truth = X_seq*slope + intercept - y_truth\n",
    "    mse_train = mean_squared_error(y_train,y_pred_train)\n",
    "    mse_test = mean_squared_error(y_test,y_pred_test)\n",
    "    mse_truth = mean_squared_error(y_truth,y_pred_truth)\n",
    "    \n",
    "    error_train_std = np.std(error_train)\n",
    "    error_test_std = np.std(error_test)\n",
    "    error_truth_std = np.std(error_truth)\n",
    "    \n",
    "    #kde_error_train = scipy.stats.gaussian_kde(error_train)\n",
    "    #kde_error_test = scipy.stats.gaussian_kde(error_test)\n",
    "    \n",
    "    plt.subplot(131)\n",
    "    plt.plot(X_seq, lasso_reg.predict(X_seq.reshape(len(X_seq),1)), color=\"black\")\n",
    "    plt.plot(X_seq, X_seq*slope+intercept, color=\"black\",alpha = 0.2)\n",
    "    plt.title(\"Ridge Regression Model, Lambda = \"+str(round(lam,2)))\n",
    "    plt.scatter(X_train,y_train,c =\"red\",alpha=0.2,edgecolors=\"black\")\n",
    "    plt.scatter(X_test,y_test,c =\"blue\",alpha=0.2,edgecolors=\"black\")\n",
    "    plt.ylim([0,500]); plt.xlim([0,20]); plt.grid()\n",
    "    plt.xlabel('Porosity (%)'); plt.ylabel('Permeability (mD)')\n",
    "    \n",
    "    plt.subplot(132)\n",
    "    plt.hist(error_train, facecolor='red',bins=np.linspace(-200.0,200.0,10),alpha=0.2,density=True,edgecolor='black',label='Train')\n",
    "    plt.hist(error_test, facecolor='blue',bins=np.linspace(-200.0,200.0,10),alpha=0.2,density=True,edgecolor='black',label='Test')\n",
    "    #plt.plot(error_seq,kde_error_train(error_seq),lw=2,label='Train',c='red')\n",
    "    #plt.plot(error_seq,kde_error_test(error_seq),lw=2,label='Test',c='blue')   \n",
    "    #plt.xlim([-55.0,55.0]); \n",
    "    plt.ylim([0,0.1])\n",
    "    plt.xlabel('Model Error'); plt.ylabel('Frequency'); plt.title('Training and Testing Error, Lambda = '+str((round(lam,2))))\n",
    "    plt.legend(loc='upper left')\n",
    "    plt.grid(True)\n",
    "    \n",
    "    plt.subplot(133); ax = plt.gca()\n",
    "    plt.plot(clam_list,cmse_train_avg,lw=2,label='Train',c='red')\n",
    "    ax.fill_between(clam_list,cmse_train_high,cmse_train_low,facecolor='red',alpha=0.05)\n",
    " \n",
    "    plt.plot(clam_list,cmse_test_avg,lw=2,label='Test',c='blue')  \n",
    "    ax.fill_between(clam_list,cmse_test_high,cmse_test_low,facecolor='blue',alpha=0.05)\n",
    "    \n",
    "    plt.plot(clam_list,cmse_truth_avg,lw=2,label='Truth',c='black')  \n",
    "    ax.fill_between(clam_list,cmse_truth_high,cmse_truth_low,facecolor='black',alpha=0.05)\n",
    "    \n",
    "    plt.xscale('log'); plt.xlim([lam_max,lam_min]); plt.yscale('log'); #plt.ylim([10,10000])\n",
    "    plt.xlabel('Model Flexibility by Regularization, Lambda'); plt.ylabel('Mean Square Error'); plt.title('Training and Testing Error vs. Model Complexity')\n",
    "    plt.legend(loc='upper left')\n",
    "    plt.grid(True)\n",
    "    \n",
    "    plt.plot([lam,lam],[1.0e-5,1.0e5],c = 'black',linewidth=3,alpha = 0.8)\n",
    "    \n",
    "    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.6, wspace=0.2, hspace=0.3)\n",
    "    plt.show()\n",
    "    \n",
    "# connect the function to make the samples and plot to the widgets    \n",
    "interactive_plot = widgets.interactive_output(run_plot, {'n':n,'split':split,'std':std,'lam':lam})\n",
    "interactive_plot.clear_output(wait = True)               # reduce flickering by delaying plot updating\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interactive Machine Learning LASSO Regression Hyperparameter Tuning Demonstation \n",
    "\n",
    "#### Michael Pyrcz, Associate Professo, University of Texas at Austin \n",
    "\n",
    "Change the number of sample data, train/test split and the data noise and observe hyperparameter tuning! Change the regularization hyperparameter, lambda, to observe a specific model example.  Based on a simple, linear truth model + noise.\n",
    "\n",
    "### The Inputs\n",
    "\n",
    "* **n** - number of data\n",
    "* **Test %** - percentage of sample data withheld as testing data\n",
    "* **Noise StDev** - standard deviation of random Gaussian error added to the data\n",
    "* **Regularization Hyperparameter** - the lambda coefficient, weight in loss the function on minimization of the  model parameters "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display(ui2, interactive_plot)                           # display the interactive plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Observations\n",
    "\n",
    "What did we learn? \n",
    "\n",
    "* regularization reduces the model sensitivity to the data, this results in reduced model variance\n",
    "* regularization is helpful with sparse data with noise\n",
    "\n",
    "When the noise is removed from the dataset, regularization increase / flexibility and sensitivity reduction reduces the quality of the model. \n",
    "\n",
    "Let's check out a more extreme example to really see this effect.  \n",
    "\n",
    "* we can enhance data sparcity by increasing the problem dimensionality\n",
    "* we will use more realistic, noisy data\n",
    "\n",
    "### Demonstration 2: Multivariate, Realistic Dataset\n",
    "\n",
    "We have to load data, set set the directory to the location that you have placed this dataset.\n",
    "\n",
    "* [unconv_MV.csv](https://github.com/GeostatsGuy/GeoDataSets/blob/master/unconv_MV.csv)\n",
    "\n",
    "The dataset is available in this repository, https://github.com/GeostatsGuy/GeoDataSets. \n",
    "\n",
    "* download this file to your working directory\n",
    "\n",
    "#### Set the working directory\n",
    "\n",
    "I always like to do this so I don't lose files and to simplify subsequent read and writes (avoid including the full address each time).  Also, in this case make sure to place the required (see below) data file in this working directory. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.chdir(\"d:\\PGE337\")                                       # set the working directory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Load the data\n",
    "\n",
    "The following loads the data into a DataFrame and then previews the first 5 samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>WellIndex</th>\n",
       "      <th>Por</th>\n",
       "      <th>LogPerm</th>\n",
       "      <th>AI</th>\n",
       "      <th>Brittle</th>\n",
       "      <th>TOC</th>\n",
       "      <th>VR</th>\n",
       "      <th>Production</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>15.91</td>\n",
       "      <td>1.67</td>\n",
       "      <td>3.06</td>\n",
       "      <td>14.05</td>\n",
       "      <td>1.36</td>\n",
       "      <td>1.85</td>\n",
       "      <td>177.381958</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>15.34</td>\n",
       "      <td>1.65</td>\n",
       "      <td>2.60</td>\n",
       "      <td>31.88</td>\n",
       "      <td>1.37</td>\n",
       "      <td>1.79</td>\n",
       "      <td>1479.767778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>20.45</td>\n",
       "      <td>2.02</td>\n",
       "      <td>3.13</td>\n",
       "      <td>63.67</td>\n",
       "      <td>1.79</td>\n",
       "      <td>2.53</td>\n",
       "      <td>4421.221583</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>11.95</td>\n",
       "      <td>1.14</td>\n",
       "      <td>3.90</td>\n",
       "      <td>58.81</td>\n",
       "      <td>0.40</td>\n",
       "      <td>2.03</td>\n",
       "      <td>1488.317629</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>19.53</td>\n",
       "      <td>1.83</td>\n",
       "      <td>2.57</td>\n",
       "      <td>43.75</td>\n",
       "      <td>1.40</td>\n",
       "      <td>2.11</td>\n",
       "      <td>5261.094919</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   WellIndex    Por  LogPerm    AI  Brittle   TOC    VR   Production\n",
       "0          1  15.91     1.67  3.06    14.05  1.36  1.85   177.381958\n",
       "1          2  15.34     1.65  2.60    31.88  1.37  1.79  1479.767778\n",
       "2          3  20.45     2.02  3.13    63.67  1.79  2.53  4421.221583\n",
       "3          4  11.95     1.14  3.90    58.81  0.40  2.03  1488.317629\n",
       "4          5  19.53     1.83  2.57    43.75  1.40  2.11  5261.094919"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_mv = pd.read_csv(\"unconv_MV.csv\") \n",
    "df_mv.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check the data\n",
    "\n",
    "We should do much more data investigation. I do this in most of my workflows, but for this one, let's just calculate the summary statistics and call it a day! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>WellIndex</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>500.500000</td>\n",
       "      <td>288.819436</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>250.75000</td>\n",
       "      <td>500.50000</td>\n",
       "      <td>750.250000</td>\n",
       "      <td>1000.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Por</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>14.950460</td>\n",
       "      <td>3.029634</td>\n",
       "      <td>5.400000</td>\n",
       "      <td>12.85750</td>\n",
       "      <td>14.98500</td>\n",
       "      <td>17.080000</td>\n",
       "      <td>24.65000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LogPerm</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>1.398880</td>\n",
       "      <td>0.405966</td>\n",
       "      <td>0.120000</td>\n",
       "      <td>1.13000</td>\n",
       "      <td>1.39000</td>\n",
       "      <td>1.680000</td>\n",
       "      <td>2.58000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AI</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>2.982610</td>\n",
       "      <td>0.577629</td>\n",
       "      <td>0.960000</td>\n",
       "      <td>2.57750</td>\n",
       "      <td>3.01000</td>\n",
       "      <td>3.360000</td>\n",
       "      <td>4.70000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Brittle</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>49.719480</td>\n",
       "      <td>15.077006</td>\n",
       "      <td>-10.500000</td>\n",
       "      <td>39.72250</td>\n",
       "      <td>49.68000</td>\n",
       "      <td>59.170000</td>\n",
       "      <td>93.47000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TOC</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>1.003810</td>\n",
       "      <td>0.504978</td>\n",
       "      <td>-0.260000</td>\n",
       "      <td>0.64000</td>\n",
       "      <td>0.99500</td>\n",
       "      <td>1.360000</td>\n",
       "      <td>2.71000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>VR</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>1.991170</td>\n",
       "      <td>0.308194</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>1.81000</td>\n",
       "      <td>2.00000</td>\n",
       "      <td>2.172500</td>\n",
       "      <td>2.90000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Production</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>2247.295809</td>\n",
       "      <td>1464.256312</td>\n",
       "      <td>2.713535</td>\n",
       "      <td>1191.36956</td>\n",
       "      <td>1976.48782</td>\n",
       "      <td>3023.594214</td>\n",
       "      <td>12568.64413</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             count         mean          std        min         25%  \\\n",
       "WellIndex   1000.0   500.500000   288.819436   1.000000   250.75000   \n",
       "Por         1000.0    14.950460     3.029634   5.400000    12.85750   \n",
       "LogPerm     1000.0     1.398880     0.405966   0.120000     1.13000   \n",
       "AI          1000.0     2.982610     0.577629   0.960000     2.57750   \n",
       "Brittle     1000.0    49.719480    15.077006 -10.500000    39.72250   \n",
       "TOC         1000.0     1.003810     0.504978  -0.260000     0.64000   \n",
       "VR          1000.0     1.991170     0.308194   0.900000     1.81000   \n",
       "Production  1000.0  2247.295809  1464.256312   2.713535  1191.36956   \n",
       "\n",
       "                   50%          75%          max  \n",
       "WellIndex    500.50000   750.250000   1000.00000  \n",
       "Por           14.98500    17.080000     24.65000  \n",
       "LogPerm        1.39000     1.680000      2.58000  \n",
       "AI             3.01000     3.360000      4.70000  \n",
       "Brittle       49.68000    59.170000     93.47000  \n",
       "TOC            0.99500     1.360000      2.71000  \n",
       "VR             2.00000     2.172500      2.90000  \n",
       "Production  1976.48782  3023.594214  12568.64413  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_mv.describe().transpose()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Build the Interactive Dashboard\n",
    "\n",
    "The following code:\n",
    "\n",
    "* makes a random sample of the dataset\n",
    "* loops over ridge regression fits varying the lambda between 1.0e-5 to 1.0e5, loops over mulitple realizations and calculates the average MSE and P10 and P90 vs. lambda.\n",
    "* calculates specified model cross validation\n",
    "* plots the example model with training and testing data, the error distributions and the MSE envelopes vs. complexity/inverse of lambda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import warnings\n",
    "# warnings.filterwarnings('ignore')\n",
    "\n",
    "# text_trap = io.StringIO()\n",
    "# sys.stdout = text_trap\n",
    "\n",
    "l = widgets.Text(value='                          Machine Learning LASSO Regression Hyperparameter Tuning Demo, Prof. Michael Pyrcz, The University of Texas at Austin',\n",
    "                 layout=Layout(width='950px', height='30px'))\n",
    "\n",
    "n = widgets.IntSlider(min=5, max = 200, value=30, step = 1, description = 'n',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)\n",
    "split = widgets.FloatSlider(min=0.05, max = .95, value=0.20, step = 0.05, description = 'Test %',orientation='horizontal',style = {'description_width': 'initial'}, continuous_update=False)\n",
    "lam = widgets.FloatLogSlider(min=-5.0, max = 5.0, value=1,base=10,step = 0.2,description = 'Regularization',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)\n",
    "\n",
    "ui = widgets.HBox([n,split,lam],)\n",
    "ui2 = widgets.VBox([l,ui],)\n",
    "\n",
    "def run_plot(n,split,lam):\n",
    "    seed = 13014; nreal = 20; slope = 20.0; intercept = 50.0\n",
    "    np.random.seed(seed) # seed the random number generator\n",
    "    lam_min = 1.0E-5;lam_max = 1.0e5\n",
    "    \n",
    "    df_sample_mv = df_mv.sample(n=n)\n",
    "    df_X_mv = df_sample_mv[['Por','LogPerm','AI','Brittle','TOC','VR']]\n",
    "    df_y_mv = df_sample_mv[['Production']]\n",
    "    \n",
    "    # calculate the MSE train and test over a range of complexity over multiple realizations of test/train split\n",
    "    clam_list = np.logspace(-5,5,20,base=10.0)\n",
    "    #print(clam_list)\n",
    "    cmse_train = np.zeros([len(clam_list),nreal]); cmse_test = np.zeros([len(clam_list),nreal])\n",
    "    coefs = np.zeros([len(clam_list),nreal,6])\n",
    "    for j in range(0,nreal):\n",
    "        for i, clam in enumerate(clam_list):\n",
    "            #print('lam' + str(clam))\n",
    "            df_X_mv_train, df_X_mv_test, df_y_mv_train, df_y_mv_test = train_test_split(df_X_mv, df_y_mv, test_size=split, random_state=seed+j)\n",
    "            n_train = len(df_X_mv_train); n_test = len(df_X_mv_test)\n",
    "            lasso_reg = Lasso(alpha=clam)\n",
    "            #print(X_train.reshape(n_train,1))\n",
    "            lasso_reg.fit(df_X_mv_train.values,df_y_mv_train.values)\n",
    "            coefs[i,j,:] = lasso_reg.coef_ \n",
    "            #print('here')\n",
    "            y_pred_train = lasso_reg.predict(df_X_mv_train.values)\n",
    "            y_pred_test = lasso_reg.predict(df_X_mv_test.values)\n",
    "            cmse_train[i,j] = mean_squared_error(df_y_mv_train.values, y_pred_train)\n",
    "            cmse_test[i,j] = mean_squared_error(df_y_mv_test.values, y_pred_test)\n",
    "    # summarize over the realizations\n",
    "    cmse_train_avg = cmse_train.mean(axis=1)\n",
    "    cmse_test_avg = cmse_test.mean(axis=1)\n",
    "    cmse_train_high = np.percentile(cmse_train,q=90,axis=1)\n",
    "    cmse_train_low = np.percentile(cmse_train,q=10,axis=1)        \n",
    "    cmse_test_high = np.percentile(cmse_test,q=90,axis=1)\n",
    "    cmse_test_low = np.percentile(cmse_test,q=10,axis=1)\n",
    "    \n",
    "    coefs_avg = coefs.mean(axis=1)\n",
    "    coefs_high = np.percentile(coefs,q=75,axis=1)\n",
    "    coefs_low = np.percentile(coefs,q=25,axis=1)\n",
    "    \n",
    "#     cmse_train_high = np.amax(cmse_train,axis=1)\n",
    "#     cmse_train_low = np.amin(cmse_train,axis=1)        \n",
    "#     cmse_test_high = np.amax(cmse_test,axis=1)\n",
    "#     cmse_test_low = np.amin(cmse_test,axis=1)\n",
    "    \n",
    "    # build the one model example to show\n",
    "    \n",
    "    df_X_mv_train, df_X_mv_test, df_y_mv_train, df_y_mv_test = train_test_split(df_X_mv, df_y_mv, test_size=split, random_state=seed+j)\n",
    "    n_train = len(df_X_mv_train); n_test = len(df_X_mv_test)\n",
    "    lasso_reg = Lasso(alpha=lam)\n",
    "    lasso_reg.fit(df_X_mv_train.values,df_y_mv_train.values)\n",
    "    y_pred_train = lasso_reg.predict(df_X_mv_train.values)\n",
    "    y_pred_test = lasso_reg.predict(df_X_mv_test.values)\n",
    "        \n",
    "    # calculate error\n",
    "    error_seq = np.linspace(-100.0,100.0,100)\n",
    "    error_train = y_pred_train - df_y_mv_train.values\n",
    "    error_test = y_pred_test - df_y_mv_test.values\n",
    "    mse_train = mean_squared_error(df_y_mv_train.values,y_pred_train)\n",
    "    mse_test = mean_squared_error(df_y_mv_test.values,y_pred_test)\n",
    "\n",
    "    error_train_std = np.std(error_train)\n",
    "    error_test_std = np.std(error_test)\n",
    "    \n",
    "    #kde_error_train = scipy.stats.gaussian_kde(error_train)\n",
    "    #kde_error_test = scipy.stats.gaussian_kde(error_test)\n",
    "    \n",
    "    plt.subplot(131)\n",
    "    plt.plot([0,8000],[0,8000],c = 'black',linewidth=3,alpha = 0.4)\n",
    "    plt.scatter(df_y_mv_train.values,y_pred_train,color=\"red\",edgecolors='black',label='Train',alpha=0.2)\n",
    "    plt.scatter(df_y_mv_test.values,y_pred_test,color=\"blue\",edgecolors='black',label='Test',alpha=0.2)\n",
    "    plt.title(\"Cross Validation, Lambda = \"+str(round(lam,2)))\n",
    "    plt.ylim([0,5000]); plt.xlim([0,5000]); plt.grid(); plt.legend(loc = 'upper left')\n",
    "    plt.xlabel('True Response'); plt.ylabel('Estimated Response')\n",
    "        \n",
    "    plt.subplot(132); ax = plt.gca()\n",
    "#     plt.hist(error_train, facecolor='red',bins=np.linspace(-2000.0,2000.0,10),alpha=0.2,density=False,edgecolor='black',label='Train')\n",
    "#     plt.hist(error_test, facecolor='blue',bins=np.linspace(-2000.0,2000.0,10),alpha=0.2,density=False,edgecolor='black',label='Test')\n",
    "#     #plt.plot(error_seq,kde_error_train(error_seq),lw=2,label='Train',c='red')\n",
    "#     #plt.plot(error_seq,kde_error_test(error_seq),lw=2,label='Test',c='blue')   \n",
    "#     #plt.xlim([-55.0,55.0]); \n",
    "#     plt.ylim([0,10])\n",
    "#     plt.xlabel('Model Error'); plt.ylabel('Frequency'); plt.title('Training and Testing Error, Lambda = '+str((round(lam,2))))\n",
    "#     plt.legend(loc='upper left')\n",
    "#     plt.grid(True)\n",
    "    color = ['black','blue','green','red','orange','grey']                                           # plot the results\n",
    "    for ifeature in range(0,6):\n",
    "        plt.semilogx(clam_list,coefs_avg[:,ifeature], label = df_mv.columns[ifeature+1], c = color[ifeature], linewidth = 3.0)\n",
    "        ax.fill_between(clam_list,coefs_high[:,ifeature],coefs_low[:,ifeature],facecolor=color[ifeature],alpha=0.1)\n",
    "    plt.title('Standardized Model Coefficients vs. Lambda Hyperparameter'); plt.xlabel('Lambda Hyperparameter'); plt.ylabel('Standardized Model Coefficients')\n",
    "    plt.xlim(lam_max,lam_min); plt.grid(); plt.legend(loc = 'lower right')\n",
    "    \n",
    "    plt.subplot(133); ax = plt.gca()\n",
    "    plt.plot(clam_list,cmse_train_avg,lw=2,label='Train',c='red')\n",
    "    ax.fill_between(clam_list,cmse_train_high,cmse_train_low,facecolor='red',alpha=0.05)\n",
    " \n",
    "    plt.plot(clam_list,cmse_test_avg,lw=2,label='Test',c='blue')  \n",
    "    ax.fill_between(clam_list,cmse_test_high,cmse_test_low,facecolor='blue',alpha=0.05)\n",
    "    \n",
    "    plt.xscale('log'); plt.xlim([lam_max,lam_min]); plt.yscale('log'); plt.ylim([1.0e5,1.0e7])\n",
    "    plt.xlabel('Model Flexibility by Regularization, Lambda'); plt.ylabel('Mean Square Error'); plt.title('Training and Testing Error vs. Model Complexity')\n",
    "    plt.legend(loc='upper left')\n",
    "    plt.grid(True)\n",
    "    \n",
    "    plt.plot([lam,lam],[1.0e5,1.0e7],c = 'black',linewidth=3,alpha = 0.8)\n",
    "    \n",
    "    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.5, top=1.0, wspace=0.2, hspace=0.3)\n",
    "    plt.show()\n",
    "    \n",
    "# connect the function to make the samples and plot to the widgets    \n",
    "interactive_plot = widgets.interactive_output(run_plot, {'n':n,'split':split,'lam':lam})\n",
    "interactive_plot.clear_output(wait = True)               # reduce flickering by delaying plot updating\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interactive Machine Learning LASSO Regression Hyperparameter Tuning Demonstation \n",
    "\n",
    "#### Michael Pyrcz, Associate Professo, University of Texas at Austin \n",
    "\n",
    "Change the number of sample data, train/test split and the data noise and observe hyperparameter tuning! Change the regularization hyperparameter, lambda, to observe a specific model example. Based on a more complicated (6 predictor features, some non-linearity and sampling error).\n",
    "\n",
    "### The Inputs\n",
    "\n",
    "* **n** - number of data\n",
    "* **Test %** - percentage of sample data withheld as testing data\n",
    "* **Regularization Hyperparameter** - the lambda coefficient, weight in loss the function on minimization of the  model parameters "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "35183ef7d64d4ae1968154765ad41778",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Text(value='                          Machine Learning LASSO Regression Hyperparameter Tuning D…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "09fd0ca31601495390403f9d17167b1a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(ui2, interactive_plot)                           # display the interactive plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Comments\n",
    "\n",
    "This was a basic demonstration of machine learning model training and tuning, with ridge regression. I have many other demonstrations and even basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n",
    "  \n",
    "#### The Author:\n",
    "\n",
    "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
    "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
    "\n",
    "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
    "\n",
    "For more about Michael check out these links:\n",
    "\n",
    "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
    "\n",
    "#### Want to Work Together?\n",
    "\n",
    "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
    "\n",
    "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
    "\n",
    "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
    "\n",
    "* I can be reached at mpyrcz@austin.utexas.edu.\n",
    "\n",
    "I'm always happy to discuss,\n",
    "\n",
    "*Michael*\n",
    "\n",
    "Michael Pyrcz, Ph.D., P.Eng. Associate Professor, Cockrell School of Engineering, Bureau of Economic Geology, and The Jackson School of Geosciences, The University of Texas at Austin\n",
    "\n",
    "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)  \n",
    "  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
